We compute per-cluster differential expression with Scanpy’s tl.rank_genes_groups using
three methods (wilcoxon, t-test, logreg). For each method we contrast the selected
Leiden cluster against all remaining cells, keep genes with Bonferroni-adjusted p < 0.05, and display the
top hits by the method’s score (up to 50 per cluster). For readability, known aliases are appended to gene symbols
in the table (display only).
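The filter-and-rank step can be sketched in plain Python. This is only an illustration under stated assumptions: the function name top_markers, the tuple layout of rows, and the toy p-values are hypothetical, and the Bonferroni correction is written out as p multiplied by the number of tested genes, capped at 1.

```python
def top_markers(rows, n_tests, alpha=0.05, top_n=50):
    """Filter one cluster's results and keep the highest-scoring genes.

    rows: iterable of (gene, score, raw_pvalue) tuples (hypothetical layout).
    n_tests: number of genes tested, used for the Bonferroni correction.
    """
    kept = []
    for gene, score, pval in rows:
        p_adj = min(1.0, pval * n_tests)  # Bonferroni: scale by test count, cap at 1
        if p_adj < alpha:
            kept.append((gene, score, p_adj))
    # Rank by the method's score, largest first, then truncate.
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:top_n]


# Toy input: three genes tested, so each p-value is multiplied by 3.
rows = [("CD3E", 12.4, 1e-9), ("MS4A1", 3.1, 0.02), ("LYZ", 8.7, 1e-6)]
hits = top_markers(rows, n_tests=3)
hit_genes = [gene for gene, _, _ in hits]
```

In this toy input MS4A1 is dropped because its adjusted p-value (0.06) exceeds the cutoff, and the two remaining genes are ordered by score.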
To make marker calls more specific to local micro-populations, we compare each cluster against its
K most-connected neighboring clusters on the KNN graph (default K = 3). We build a cluster-by-cluster
connectivity matrix by summing cell-level graph weights (obsp['connectivities']), pick the top-K
neighbors per cluster, and run rank_genes_groups (wilcoxon) with the target cluster as the
group and the pooled neighbors as the reference. Results are filtered at adj. p ≤ 0.05
and ranked by |logFC|; we keep the top 50 per cluster. If the neighbor graph is missing, we compute PCA and
pp.neighbors on the fly; if too few clusters exist to select neighbors, we fall back to cluster-vs-rest.
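The neighbor selection described above can be sketched as follows. This is a minimal illustration, not the dashboard's code: it assumes the cell-level graph is given as a list of (cell_i, cell_j, weight) entries, each undirected edge listed once (for example the upper triangle of obsp['connectivities']), and the function name top_k_neighbor_clusters is hypothetical.

```python
from collections import defaultdict


def top_k_neighbor_clusters(edges, labels, k=3):
    """Return, for each cluster, its k most-connected other clusters.

    edges: (cell_i, cell_j, weight) entries, each undirected edge listed once.
    labels: cluster label per cell, indexed by cell position.
    """
    # Cluster-by-cluster connectivity: sum cell-level weights across clusters.
    conn = defaultdict(lambda: defaultdict(float))
    for i, j, w in edges:
        a, b = labels[i], labels[j]
        if a != b:  # within-cluster edges do not inform neighbor choice
            conn[a][b] += w
            conn[b][a] += w

    neighbors = {}
    for c in sorted(set(labels)):
        ranked = sorted(conn[c].items(), key=lambda kv: kv[1], reverse=True)
        neighbors[c] = [name for name, _ in ranked[:k]]
    return neighbors


# Toy graph: six cells in three clusters.
labels = ["0", "0", "1", "1", "2", "2"]
edges = [(0, 2, 0.9), (1, 3, 0.5), (1, 4, 0.2), (3, 5, 0.7), (0, 1, 1.0)]
nearest = top_k_neighbor_clusters(edges, labels, k=1)
```

A cluster whose neighbor list comes back empty has no cross-cluster edges; that is one situation in which a cluster-vs-rest comparison would be the natural fallback.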
We score cell-type signatures with decoupler’s ULM method against a curated
PanglaoDB marker resource (human entries, canonical markers only, sensitivity > 0.5,
duplicates removed). ULM scores are computed per cell (dc.mt.ulm, tmin=3),
then summarized per cluster using rankby_group (t-test with overestimated variance);
only positive-stat entries are kept. The “Predicted annotation” tab shows (i) a UMAP colored by per-cell
enrichment for the selected cell type and (ii) per-cluster score distributions. We list the top 5 candidate
cell types per cluster as suggestions, not hard labels.
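For intuition about the per-cell score: ULM fits a univariate linear model for each cell and each signature, using genes as observations and the signature weights as the single covariate, and reports the t-value of the slope. The sketch below reimplements that idea in plain Python for one cell and one signature. It is an illustration only, not decoupler's code; the function name ulm_score and the toy values are hypothetical, and tmin is treated here as the minimum number of signature genes required.

```python
import math


def ulm_score(expr, weights, tmin=3):
    """t-value of the slope when regressing one cell's expression on weights.

    expr: expression of every gene in one cell.
    weights: signature weight per gene (0 for genes outside the signature).
    Returns None when the signature has fewer than tmin genes.
    """
    if sum(1 for w in weights if w != 0) < tmin:
        return None
    n = len(expr)
    mx = sum(weights) / n
    my = sum(expr) / n
    sxx = sum((x - mx) ** 2 for x in weights)
    syy = sum((y - my) ** 2 for y in expr)
    sxy = sum((x - mx) * (y - my) for x, y in zip(weights, expr))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no slope can be estimated
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    # For a single covariate, the slope's t-value follows directly from r.
    return r * math.sqrt((n - 2) / (1 - r * r))


# One cell, six genes; the first three genes are markers of the signature.
cell_expr = [5.0, 4.0, 6.0, 0.0, 1.0, 0.5]
markers = [1, 1, 1, 0, 0, 0]
score = ulm_score(cell_expr, markers)
```

A positive score means the signature genes are expressed above the cell's other genes, which is the direction retained in the per-cluster summary.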
TF activity is estimated with decoupler ULM using the human CollecTRI regulon network. We compute per-cell TF scores, then rank TFs per cluster (same rankby_group procedure as for the cell-type scores) and retain the top 5 TFs as “program markers.” The TF tab shows a UMAP of the selected TF’s activity and a violin panel with cluster-wise distributions.
Co-expression: given Gene A and Gene B, cells are colored either (a) in bivariate mode using independent quantile thresholds for A and B (A-high, B-high, both-high, low) or (b) in ratio mode using A/B (or log2(A/B)). Controls include per-gene clipping to upper quantiles, binarization, a global scale option, and an Otsu-based auto-threshold “Suggest” button for each gene. An auxiliary table lists cells with the most extreme A/B ratios.
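The two coloring modes and the Otsu suggestion can be sketched in plain Python. This is a hedged illustration rather than the dashboard's implementation: the function names, the 0.75 default quantile, the pseudocount in the ratio, and the 64-bin histogram are all assumptions.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of a list of numbers."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bivariate_classes(a, b, q_a=0.75, q_b=0.75):
    """Label each cell using independent quantile thresholds for A and B."""
    ta, tb = quantile(a, q_a), quantile(b, q_b)
    labels = []
    for x, y in zip(a, b):
        hi_a, hi_b = x > ta, y > tb
        if hi_a and hi_b:
            labels.append("both-high")
        elif hi_a:
            labels.append("A-high")
        elif hi_b:
            labels.append("B-high")
        else:
            labels.append("low")
    return labels


def log2_ratio(a, b, pseudocount=1.0):
    """Per-cell log2(A/B); the pseudocount (an assumption) avoids division by zero."""
    return [math.log2((x + pseudocount) / (y + pseudocount)) for x, y in zip(a, b)]


def otsu_threshold(values, bins=64):
    """Histogram-based Otsu threshold: maximize the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(c * h for c, h in zip(centers, hist))
    best_var, best_i, w0, sum0 = -1.0, 0, 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_i = var_between, i
    return lo + (best_i + 1) * width  # upper edge of the best split bin


gene_a = [0, 1, 5, 6, 0, 7, 0, 1]
gene_b = [0, 0, 4, 0, 5, 6, 1, 0]
classes = bivariate_classes(gene_a, gene_b, q_a=0.5, q_b=0.5)
ratios = log2_ratio(gene_a, gene_b)
suggested = otsu_threshold([0, 0.1, 0.2, 0.1, 0, 5, 5.1, 4.9, 5.2, 5])
```

Quantile thresholds adapt to each gene's own distribution, while the Otsu suggestion looks for a split between a low and a high mode, which tends to suit genes with clearly bimodal expression.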
Gene-set scoring: the “Signature” panel computes a per-cell mean Z-score across the pasted gene list (each gene is Z-scored across cells; the signature score is their mean). The resulting score is rendered on UMAP and also registered as a virtual gene so it can be reused elsewhere in the dashboard.
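The mean Z-score can be written out as a short sketch. The details below are assumptions for illustration: the function name signature_score, the dict-of-lists input, the use of the population standard deviation, and the choice to skip genes that are missing or constant across cells.

```python
import statistics


def signature_score(expr_by_gene, gene_list):
    """Per-cell mean Z-score across the genes of a signature.

    expr_by_gene: dict mapping gene symbol to a list of per-cell expression values.
    gene_list: pasted gene symbols; symbols absent from the data are ignored.
    """
    present = [g for g in gene_list if g in expr_by_gene]
    if not present:
        raise ValueError("none of the signature genes are in the dataset")
    n_cells = len(expr_by_gene[present[0]])
    totals = [0.0] * n_cells
    used = 0
    for g in present:
        vals = expr_by_gene[g]
        mu = statistics.fmean(vals)
        sd = statistics.pstdev(vals)  # population SD (an assumption)
        if sd == 0:
            continue  # a constant gene has no defined Z-score
        for i, v in enumerate(vals):
            totals[i] += (v - mu) / sd  # Z-score of this gene in cell i
        used += 1
    if used == 0:
        return [0.0] * n_cells
    return [t / used for t in totals]


# Four cells; ACTB is constant and MISSING is not in the data.
expr = {
    "GZMB": [0, 0, 4, 4],
    "PRF1": [1, 1, 3, 3],
    "ACTB": [5, 5, 5, 5],
}
sig = signature_score(expr, ["GZMB", "PRF1", "ACTB", "MISSING"])
```

Because each gene is standardized before averaging, a highly expressed gene does not dominate the signature; every informative gene contributes on the same scale.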
If enabled at build time (“Generate biology insights”), we produce cluster-wise narrative summaries with a large language model (Gemini family with automatic fallbacks). Only cluster-level summaries are provided to the model: cluster sizes, neighbor-aware top markers, method-specific marker tables, the per-cluster PanglaoDB enrichment ranks, and the top TFs. Neither raw counts nor per-cell expression matrices are sent. The model returns concise Markdown with a proposed label (optionally with a refined subtype), a confidence level (High/Medium/Low), and supporting markers. Treat the AI-generated text as suggestive and validate it against the Markers, Predicted annotation, and TF tabs.
Rendering notes: expression vectors are quantized for sparsity-aware transport and colored with a
grey→rainbow palette; QC and composition views reflect the currently selected layer.
Interactive data mining and analysis of bulk and single-cell expression data — delivered as a fast, shareable HTML dashboard with UMAPs, marker discovery, enrichment, TF programs, and co-expression exploration.
Single Cell MultiOmics Lab
If you use STREAM in your work, please cite: STREAM (v2.0) — Streamlined Toolkit for Real-time Exploratory Analysis of Multiomics, generated 2025-08-19T13:16:19.
Disclaimer. STREAM is a research and education tool. It is not intended for clinical, diagnostic, or patient-management decisions.
MIT License
Copyright (c) 2025 Single Cell MultiOmics Lab / STREAM Authors
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.